Improving NMF clustering by leveraging contextual relationships among words
Authors
Abstract
Non-negative Matrix Factorization (NMF) and its variants have been successfully used for clustering text documents. However, NMF approaches, like other models, do not explicitly account for the contextual dependencies between words. To remedy this limitation, we draw inspiration from neural word embedding and posit that words that frequently co-occur within the same context (e.g., sentence or document) are likely related to each other in some semantic aspect. We then propose to jointly factorize the document-word and word-word co-occurrence matrices. The decomposition of the latter matrix encourages frequently co-occurring words to have similar latent representations, thereby reflecting the relationships among them. Empirical results on several real-world datasets provide strong support for the benefits of our approach. Our main finding is that NMF can drastically improve its clustering performance by leveraging the contextual relationships among words explicitly.
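The joint factorization described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the exact objective, so the sketch assumes a squared-error loss `||X - Z W^T||^2 + lam * ||M - W Q^T||^2` with standard multiplicative updates, and it assumes `M` holds raw document-level co-occurrence counts. The names `joint_nmf`, `lam`, `Z`, `W`, and `Q` are hypothetical.

```python
import numpy as np


def joint_nmf(X, M, k, lam=1.0, n_iter=200, eps=1e-9, seed=0):
    """Jointly factorize X (documents x words) and M (words x words).

    Assumed objective (not taken from the paper):
        ||X - Z W^T||_F^2 + lam * ||M - W Q^T||_F^2,  with Z, W, Q >= 0.
    The word factors W are shared by both terms, so words that co-occur
    often are pushed toward similar latent representations.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Z = rng.random((n, k)) + 0.1  # document factors
    W = rng.random((d, k)) + 0.1  # word factors (shared)
    Q = rng.random((d, k)) + 0.1  # context-word factors
    losses = []
    for _ in range(n_iter):
        # Multiplicative updates keep every factor non-negative.
        Z *= (X @ W) / (Z @ (W.T @ W) + eps)
        W *= (X.T @ Z + lam * (M @ Q)) / (
            W @ (Z.T @ Z) + lam * (W @ (Q.T @ Q)) + eps
        )
        Q *= (M.T @ W) / (Q @ (W.T @ W) + eps)
        loss = (
            np.linalg.norm(X - Z @ W.T) ** 2
            + lam * np.linalg.norm(M - W @ Q.T) ** 2
        )
        losses.append(loss)
    return Z, W, Q, losses


# Toy document-word count matrix: 6 documents, 6 words, two vocabularies.
X = np.array(
    [
        [3, 2, 1, 0, 0, 0],
        [2, 3, 2, 0, 0, 0],
        [1, 2, 3, 0, 0, 0],
        [0, 0, 0, 3, 1, 2],
        [0, 0, 0, 2, 3, 1],
        [0, 0, 0, 1, 2, 3],
    ],
    dtype=float,
)

# Assumed co-occurrence matrix: number of documents containing both words.
B = (X > 0).astype(float)
M = B.T @ B
np.fill_diagonal(M, 0.0)

Z, W, Q, losses = joint_nmf(X, M, k=2)
labels = Z.argmax(axis=1)  # cluster of each document = largest factor
print(labels)
```

Setting `lam=0` removes the influence of the co-occurrence term on `W` and reduces the sketch to plain NMF on `X`, which gives a simple baseline for comparison.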
Similar resources
Refinement of Document Clustering by Using NMF
In this paper, we use non-negative matrix factorization (NMF) to refine document clustering results. NMF is a dimensionality reduction method that is effective for document clustering, because a term-document matrix is high-dimensional and sparse. The initial matrix of the NMF algorithm can be regarded as a clustering result, so we can use NMF as a refinement method. First we perform min-max cu...
Improving Newsgroup Clustering by Filtering Author-Specific Words
Introduction. This paper describes the first step in a project for topic identification in help-desk applications. In this step, we apply a clustering mechanism to identify the topics of newsgroup discussions. We have used newsgroup discussions as our testbed, as they provide a good approximation to our target application, while obviating the need for manual tagging of topics. We have found tha...
KEY WORDS-Statistical Parsing, Grammar Acquisition, Clustering Analysis, Local Contextual
This paper proposes a new method for learning a context-sensitive conditional probability context-free grammar from an unlabeled bracketed corpus based on clustering analysis and describes a natural language parsing model which uses a probability-based scoring function of the grammar to rank parses of a sentence. By grouping brackets in a corpus into a number of similar bracket groups based on ...
Shifted NMF with Group Sparsity for Clustering NMF Basis Functions
Recently, Non-negative Matrix Factorisation (NMF) has found application in the separation of individual sound sources. NMF decomposes the spectrogram of an audio mixture into an additive, parts-based representation where the parts typically correspond to individual notes or chords. However, there is a need to cluster the NMF basis functions to their sources. Although many attempts have been made to...
Journal
Journal title: Neurocomputing
Year: 2022
ISSN: 0925-2312, 1872-8286
DOI: https://doi.org/10.1016/j.neucom.2022.04.122